Accurate viral population assembly from ultra-deep sequencing data

Authors

  • Serghei Mangul
  • Nicholas C. Wu
  • Nicholas Mancuso
  • Alex Zelikovsky
  • Ren Sun
  • Eleazar Eskin
Abstract

MOTIVATION: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. Although the sequencing coverage is high enough that even rare viral variants are sequenced, sequencing errors make it difficult to distinguish rare variants from errors.

RESULTS: In this article, we present a method that overcomes the limitations of sequencing technologies and assembles a diverse viral population, allowing the detection of previously undiscovered rare variants. The method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The protocol eliminates sequencing errors by using individual barcodes attached to the sequencing fragments. The combination of highly accurate data and deep coverage allows VGA to assemble rare variants. VGA uses an expectation-maximization algorithm to estimate the abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method accurately assembles an HIV viral population and detects rare variants that were previously undetectable because of sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads.

AVAILABILITY: Our tool VGA is freely available at http://genetics.cs.ucla.edu/vga/
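The abstract states that VGA estimates variant abundances with an expectation-maximization (EM) algorithm but does not give the model. The following is a minimal Python sketch of generic mixture-model EM for abundance estimation, not VGA's actual implementation. It assumes error-free reads aligned to a common coordinate system (as one might obtain after barcode-based error correction), equal-length candidate variants, and a read likelihood that is 1 when the read matches a variant at its start position and 0 otherwise. The names `compatible`, `em_abundances`, and `Read` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical read representation: (start position on a shared coordinate system, sequence).
Read = Tuple[int, str]


def compatible(read: Read, variant: str) -> bool:
    """Return True if the aligned read matches the variant exactly at its start."""
    start, seq = read
    return variant[start:start + len(seq)] == seq


def em_abundances(variants: List[str], reads: List[Read],
                  n_iter: int = 500, tol: float = 1e-12) -> List[float]:
    """Estimate relative variant frequencies with EM on a simple mixture model.

    Assumption (not VGA's published model): reads are error-free, so
    P(read | variant) is 1 if the read is compatible with the variant and 0
    otherwise; the uniform start-position term cancels because all variants
    are assumed to have the same length.
    """
    k = len(variants)
    # Compatibility sets: indices of the variants each read could come from.
    # Reads that match no candidate variant carry no information and are dropped.
    compat = [[j for j, v in enumerate(variants) if compatible(r, v)] for r in reads]
    compat = [c for c in compat if c]
    if not compat:
        raise ValueError("no read is compatible with any candidate variant")

    freqs = [1.0 / k] * k  # start from uniform frequencies
    for _ in range(n_iter):
        expected = [0.0] * k
        # E-step: split each read among its compatible variants in proportion
        # to the current frequency estimates.
        for c in compat:
            total = sum(freqs[j] for j in c)
            for j in c:
                expected[j] += freqs[j] / total
        # M-step: a variant's new frequency is its expected share of the reads.
        new_freqs = [e / len(compat) for e in expected]
        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    variants = ["AAAA", "AAAT", "CCCC"]
    # 3 reads unique to AAAA, 1 unique to AAAT, 4 shared by AAAA and AAAT.
    reads = [(3, "A")] * 3 + [(3, "T")] + [(0, "AA")] * 4
    print(em_abundances(variants, reads))  # close to [0.75, 0.25, 0.0]
```

In this toy example the four shared reads are apportioned according to the current estimates at each iteration, so the solution is driven by the uniquely assigned reads (3 versus 1), and the candidate with no supporting reads receives frequency 0.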

Volume: 30

Publication date: 2014